Hybrid 3′ untranslated regions suitable for efficient protein expression in mammalian cells

ABSTRACT

The present invention describes the use of short hybrid 3′ untranslated regions (3′UTRs) which are composed of two regions: one region from a 3′ untranslated region of a stable eukaryotic mRNA, and another region from the downstream end of a 3′ untranslated region of another eukaryotic mRNA that contains a polyadenylation (polyA) signal. The use of such hybrid regions allows for efficient protein expression when used in conjunction with circular or linear expression DNA molecules. The present invention provides a recombinant DNA that is composed of a promoter, a protein coding region, and the hybrid 3′UTR in a continuous and directional orientation. The efficient expression systems of the invention are suitable for the economical and efficient production of therapeutic proteins and for use with both transient and stable expression systems.

FIELD OF INVENTION

The field of invention is expression vector engineering, mammalian protein expression, recombinant DNA technology, and expression of therapeutic proteins.

BACKGROUND OF INVENTION

Eukaryotic messenger RNA (mRNA) contains three regions: the 5′ untranslated region (5′UTR), the protein coding region, and the 3′ untranslated region (3′UTR). The 3′UTR is the sequence towards the 3′ end of the mRNA and contains a polyadenylation signal and a polyadenylation site. The 3′UTR is an important regulatory element: in many instances it dictates mRNA stability, and it can also regulate translation efficiency. For example, AU-rich elements tend to destabilize mRNAs (Dalphin et al. 1999). The mRNAs can be classified into three categories based on their half-life: (a) unstable mRNAs with a short half-life, for example, less than 2 hours, (b) intermediately stable mRNAs with a half-life that exceeds, for example, 2 hours, and (c) stable mRNAs with a half-life that exceeds 8 hours. Many of the housekeeping genes tend to code for stable mRNAs, such as but not limited to globin mRNAs (Chen and Shyu 1994; Shaw and Kamen 1986), β-actin, ribosomal proteins, and glyceraldehyde 3-phosphate dehydrogenase (GAPDH).
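The three half-life categories above can be illustrated with a minimal Python sketch. The function name and the handling of a half-life of exactly 2 hours (treated here as unstable) are assumptions made for illustration; the text gives the cut-offs only as examples.

```python
def classify_mrna_stability(half_life_hours: float) -> str:
    """Classify an mRNA by half-life using the example cut-offs from the text.

    Assumption: a half-life of exactly 2 hours is placed in the unstable
    class, because the text only says "less than" and "exceeds" 2 hours.
    """
    if half_life_hours < 0:
        raise ValueError("half-life must be non-negative")
    if half_life_hours > 8:
        return "stable"
    if half_life_hours > 2:
        return "intermediate"
    return "unstable"


for hours in (1, 5, 10):
    print(hours, classify_mrna_stability(hours))
```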

Expression vectors are important research and biotechnology tools. In research they are used to study protein function, while in industry they are used to produce (therapeutic) proteins. Optimization of expression vectors for efficient protein production has been sought and practiced in many ways, mostly by optimizing strong promoters that lead to efficient transcription of the mRNA encoded by the protein coding region in the expression vector. Examples of these are the CMV immediate early promoter, SV40 promoter, elongation factor (EF) promoter, and chicken β-actin promoter (Foecking and Hofstetter 1986; Kobayashi et al. 1997).

Likewise, strong polyA signal cassettes have been used to augment protein expression by providing strong polyadenylation signals that stabilize the mRNAs produced from the expression vectors. Among the widely used polyA signals are the bovine growth hormone (bGH) polyA signal (U.S. Pat. No. 5,122,458 by Post et al.: Use of a bGH gDNA polyadenylation signal in expression of non-bGH polypeptides in higher eukaryotic cells) and the SV40 polyA signal (Goodwin and Rottman 1992). Unlike termination in bacteria, where the 3′ end of the mRNA is formed when the RNA polymerase simply drops off the DNA and ceases transcription, in eukaryotes the 3′ end is formed by cleavage, which follows binding of the polyadenylation complex to an AAUAAA sequence near the end of the mRNA. This complex contains an endonuclease that cuts the RNA about 14-30 bases downstream of the AAUAAA sequence and a polymerase that adds a string of adenosines, forming the polyA tail (Wigley et al. 1990).
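As a rough illustration of the signal and cleavage geometry described above, the following Python sketch locates AAUAAA hexamers and reports the window 14 to 30 bases downstream of each one. The function name, the 0-based coordinates, and the choice to measure offsets from the end of the hexamer are assumptions for illustration only; real cleavage-site prediction depends on additional context sequences.

```python
import re

POLYA_SIGNAL = "AAUAAA"


def cleavage_windows(rna, min_offset=14, max_offset=30):
    """Return (signal_start, window_start, window_end) for each AAUAAA hit.

    Coordinates are 0-based and window_end is exclusive. Offsets are counted
    from the first base after the hexamer (an assumption), and windows are
    clipped to the sequence length. DNA input (T) is converted to RNA (U).
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    # A lookahead is used so that overlapping hexamers are all reported.
    for match in re.finditer("(?=" + POLYA_SIGNAL + ")", rna):
        signal_end = match.start() + len(POLYA_SIGNAL)
        start = min(signal_end + min_offset, len(rna))
        end = min(signal_end + max_offset + 1, len(rna))
        hits.append((match.start(), start, end))
    return hits


example = "GG" + "AAUAAA" + "C" * 40
print(cleavage_windows(example))
```

An empty list means that no canonical hexamer was found in the input.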

Since the 3′UTR and polyA signal context sequence may influence nuclease degradation of plasmid DNA vectors, particularly after delivery and during trafficking to the nucleus, a novel approach that circumvents this problem is described. In this invention, an approach to further optimize expression of proteins is described which uses a hybrid 3′UTR downstream of the protein coding region that consists of two regions, wherein one region is from a 3′UTR of a stable mRNA and the other region is a polyA signal containing region from a 3′UTR of another mRNA.

SUMMARY OF INVENTION

The present invention describes the construction of short hybrid 3′ untranslated regions (3′UTRs) for use in mammalian expression vectors to boost the expression of proteins encoded in these vectors. The short hybrid 3′UTR comprises two regions that are near each other, i.e. in proximity to each other, adjacent to each other, overlapping with each other, or with one encompassing the other: one region from a 3′ untranslated region of a stable eukaryotic mRNA, and another region from the downstream end of a 3′ untranslated region of a stable eukaryotic mRNA which contains the polyadenylation signal. The first region does not contain a polyadenylation signal, whereas the second region contains a polyadenylation signal and is derived from the downstream end of a 3′ untranslated region of a stable eukaryotic mRNA.

In a preferred embodiment, the first region or the two regions comprise the 3′UTR or part of the 3′UTR of the human eukaryotic translation elongation factor 1 alpha 1 (EEF1A1).

The present invention further provides a recombinant DNA that is composed of a promoter, a protein coding region, and the hybrid 3′UTR, in a continuous and directional orientation.

BRIEF DESCRIPTION OF FIGURES

FIG. 1. Construction of the hybrid 3′UTR construct.

A vector containing the CMV/Intron A promoter and a coding region for green fluorescent protein (GFP), followed by a 3′UTR containing a eukaryotic polyadenylation signal (AATAAA), is modified to include part of the 3′UTR of a human housekeeping gene, human eukaryotic translation elongation factor 1 alpha 1 (EEF1A1). The EEF1A1 region was cloned into the original 3′UTR region using restriction site cloning, and the result is a hybrid 3′UTR region. The upper panel A shows the original construct and the lower panel B shows the new construct with the hybrid 3′UTR.

FIG. 2. GFP fluorescence due to protein expression in HEK 293 cells using the unmodified vector (A) and vector containing the hybrid 3′UTR (B).

FIG. 3. GFP fluorescence due to protein expression in two different cell types, HeLa (A and B) and Huh7 liver cell line (C and D), using the unmodified vector (A and C) and vector containing the hybrid 3′UTR (B and D).

FIG. 4. GFP fluorescence due to protein expression in HEK 293 cells using the unmodified vector (A) and linear DNA containing the hybrid 3′UTR in the form of a PCR product (B).

DETAILED DESCRIPTION OF INVENTION

As discussed above, there is a need to improve and optimize protein expression. Therefore, the problem to be solved by this invention was to further improve and optimize protein expression systems, such as by improving and optimizing the engineering of expression vectors and/or of their components.

The problem is solved by the present invention by providing a recombinant nucleic acid that comprises a first and a second nucleic acid region, wherein

-   (a) said first region is derived from a 3′ untranslated region (3′UTR) of a stable eukaryotic mRNA,
-   (b) said first region does not contain a polyadenylation signal,
-   (c) said second region is derived from the downstream end of a 3′ untranslated region of another or the same stable eukaryotic mRNA as in (a), and
-   (d) said second region contains a polyadenylation signal.

This is subject to the proviso that said first and second nucleic acid regions, when taken together, form a hybrid 3′UTR, said hybrid 3′UTR being different from its naturally occurring, i.e. original, counterpart. This proviso means, for example, that when said first region is derived from the upstream end of a 3′UTR of a stable eukaryotic mRNA and said second region is derived from the downstream end of the 3′UTR of the same stable eukaryotic mRNA, the hybrid 3′UTR is different from the naturally occurring, i.e. original, 3′UTR.
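Conditions (b) and (d) and the proviso can be checked on sequence data alone, as in the following minimal Python sketch. It assumes that the hybrid is a simple concatenation of the first region followed by the second region and that only the canonical AATAAA hexamer counts as a polyadenylation signal; the function names are hypothetical. Conditions (a) and (c), which concern the origin of each region, cannot be verified from sequence alone and are not checked.

```python
POLYA_SIGNAL = "AATAAA"  # canonical DNA form of the AAUAAA hexamer


def _norm(seq):
    """Upper-case the sequence and express it as DNA."""
    return seq.upper().replace("U", "T")


def has_polya_signal(seq):
    return POLYA_SIGNAL in _norm(seq)


def check_hybrid_utr(first_region, second_region, natural_utrs=()):
    """Return a list of violated conditions; an empty list means none found.

    Assumption: the hybrid is first_region directly followed by second_region.
    natural_utrs holds naturally occurring 3'UTR sequences to compare against.
    """
    problems = []
    if has_polya_signal(first_region):
        problems.append("first region contains a polyadenylation signal")
    if not has_polya_signal(second_region):
        problems.append("second region lacks a polyadenylation signal")
    hybrid = _norm(first_region) + _norm(second_region)
    if any(hybrid == _norm(utr) for utr in natural_utrs):
        problems.append("hybrid is identical to a naturally occurring 3'UTR")
    return problems


print(check_hybrid_utr("GCTAGCTAGG", "CCAATAAAGG"))
```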

By “nucleic acid” is meant a single-stranded or double-stranded chain of two or more nucleotide bases including, without limitation, deoxyribonucleic acid (DNA), ribonucleic acid (RNA), mRNA, and cDNA, and analogs of either DNA or RNA, such as PNA. The recombinant nucleic acids of the present invention are preferably DNA, RNA, or PNA.

By the term a region is “derived from” a 3′ untranslated region (3′UTR) of a stable eukaryotic mRNA is meant that the region is the 3′UTR itself, is a part of said 3′UTR, or is a sequence of the 3′UTR comprising mutations, deletions, truncations, insertions and/or single nucleotide polymorphisms (SNPs).

In a preferred embodiment recombinant nucleic acids are provided, wherein said first region and said second region of the recombinant nucleic acid are in proximity to each other, or wherein said first region and said second region are adjacent to each other, or wherein said first region and said second region overlap with each other, or wherein one region encompasses the other region.

Preferred proximities of the first and second region, i.e. distances between the first and second region, are in the range of 0 to 500 nucleotides, more preferred in the range of 0 to 250 nucleotides, and most preferred in the range of 0 to 100 nucleotides.
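The four arrangements and the preferred distance ranges can be expressed as a small Python sketch. It assumes that each region is given as a half-open (start, end) coordinate pair on the same strand of the construct; the function names and the returned labels are illustrative only.

```python
def region_gap(a, b):
    """Gap in nucleotides between two half-open intervals (0 if they touch or overlap)."""
    (_, a_end), (b_start, _) = sorted([a, b])
    return max(0, b_start - a_end)


def describe_arrangement(first, second):
    """Return (relation, gap, preference tier) for two (start, end) regions."""
    f_start, f_end = first
    s_start, s_end = second
    if (f_start <= s_start and s_end <= f_end) or (s_start <= f_start and f_end <= s_end):
        relation = "one encompasses the other"
    elif f_start < s_end and s_start < f_end:
        relation = "overlapping"
    elif f_end == s_start or s_end == f_start:
        relation = "adjacent"
    else:
        relation = "in proximity"
    gap = region_gap(first, second)
    # Tiers follow the ranges stated in the text; the narrowest range wins.
    if gap <= 100:
        tier = "most preferred (0-100 nt)"
    elif gap <= 250:
        tier = "more preferred (0-250 nt)"
    elif gap <= 500:
        tier = "preferred (0-500 nt)"
    else:
        tier = "outside the preferred ranges"
    return relation, gap, tier


print(describe_arrangement((0, 300), (450, 500)))
```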

It is preferred that the 3′ end of the first region is adjacent to the 5′ end of the second region or that the 3′ end of the second region is adjacent to the 5′ end of the first region.

It is further preferred that the first region is located within the second region, wherein the first region is preferably located upstream of the polyadenylation signal of the second region.

It is preferred that at least one of the regions is derived from stable mRNAs, such as those of housekeeping genes which exist in abundant amounts inside the cells. As already defined above, stable mRNAs are mRNAs with a half-life that exceeds 8 hours.

Preferred housekeeping and/or stable mRNAs are, but are not limited to, β-actin, α- and β-globins, glyceraldehyde 3-phosphate dehydrogenase (GAPDH), growth hormone, eukaryotic translation elongation factor 1 alpha 1 (EEF1A1), and many of the ribosomal proteins. Other housekeeping genes are known in the art, see for example Eisenberg and Levanon 2003. It is preferred that the first region and/or the second region is derived from the 3′UTR of housekeeping genes that are abundant (see Table 1).

Since the 3′UTR of housekeeping genes is generally shorter (see Eisenberg and Levanon 2003, which is also the result of our analysis, see Example 1), the preferred length of the total hybrid 3′UTR according to the invention is less than 1000 nucleotides, preferably less than 500 nucleotides.

In a preferred embodiment the first region is derived from the 3′UTR of eukaryotic translation elongation factor 1 alpha 1 (EEF1A1), see also FIG. 1 and Table 2. In a more preferred embodiment the first region is the 3′UTR of eukaryotic translation elongation factor 1 alpha 1 (EEF1A1) or a part thereof.

In another preferred embodiment the first and the second region are derived from the 3′UTR of eukaryotic translation elongation factor 1 alpha 1 (EEF1A1).

In another preferred embodiment, the first region is derived from the 3′UTR of EEF1A1 and the second region is derived from the 3′UTR of bovine growth hormone (bGH), preferably a smaller region of the bovine growth hormone 3′UTR that contains an efficient polyadenylation signal.

In another aspect of the present invention a recombinant nucleic acid is provided that comprises a first and a second nucleic acid region, wherein both regions are derived from the 3′UTR of the same stable eukaryotic mRNA, preferably from the 3′UTR of eukaryotic translation elongation factor 1 alpha 1 (EEF1A1) or a part thereof. Furthermore, in this aspect of the present invention the above stated proviso does not apply. In a preferred embodiment of this aspect of the present invention the first or second region is identical to the 3′UTR of EEF1A1 or one or more parts thereof. In another preferred embodiment of this aspect of the present invention the first and second region when taken together are identical to the 3′UTR of EEF1A1 or one or more parts thereof.

The problem is further solved by the present invention by providing a further recombinant nucleic acid comprising the following components in 5′ to 3′ direction:

-   (i) a promoter sequence,
-   (ii) a protein coding region sequence, and
-   (iii) the hybrid 3′UTR of the present invention as defined above, which is operably linked to the protein coding sequence of (ii).

In another aspect of the present invention the further recombinant nucleic acid comprises the following components in 5′ to 3′ direction:

-   (i) a promoter sequence,
-   (ii) a protein coding region sequence, and
-   (iii) the 3′UTR of EEF1A1 or parts thereof as defined above, which is operably linked to the protein coding sequence of (ii).

By “operably linked” is meant that the nucleic acid sequence encoding a protein of interest and (transcriptional) regulatory sequences, such as the 3′UTR, are connected in such a way as to permit expression of the nucleic acid sequence in vivo, such as when introduced into a cell. Preferably, the hybrid 3′UTR of the present invention is operably linked to the protein coding sequence in a continuous (uninterrupted) and directional orientation, such as adjacent to the protein coding sequence.

In a preferred embodiment the recombinant nucleic acid of the invention is a linear DNA molecule. Such linear DNA molecules are preferably generated by PCR using at least two primers that specifically hybridize to regions near the 5′ end and near the 3′ end of said nucleic acid, which is comprised in an expression vector.

The problem is further solved by the present invention by providing an expression vector comprising the nucleic acid according to the present invention.

It is preferred that the expression vector further comprises a selection marker. Selection markers are known in the art; however, preferred selection markers are the neomycin resistance gene and the blasticidin resistance gene.

The protein coding sequences are either coding for a reporter gene or, more preferably, a therapeutic protein. Preferred reporter genes are GFP, luciferase, or other reporter genes known in the art. Preferred therapeutic proteins are interferons, growth factors, anti-angiogenesis proteins, apoptosis modulating proteins, tumor growth factors or spread suppressing factors, vaccines, recombinant antibodies, or any other current or future therapeutic protein.

The nucleic acids according to the present invention are preferably produced in linear form by a method that comprises the following steps of

-   (a) providing an expression vector comprising a nucleic acid according to the present invention, wherein said nucleic acid comprises the following components in 5′ to 3′ direction:
    -   (i) a promoter sequence,
    -   (ii) a protein coding region sequence, and
    -   (iii) the hybrid 3′UTR of the present invention as defined above, which is operably linked to the protein coding sequence of (ii),
-   (b) providing at least two primers, wherein one primer specifically hybridizes to regions near the 5′ end of the nucleic acid according to the present invention, and wherein the other primer specifically hybridizes to regions near the 3′ end of the nucleic acid according to the present invention, wherein said nucleic acid comprises the following components in 5′ to 3′ direction:
    -   (i) a promoter sequence,
    -   (ii) a protein coding region sequence, and
    -   (iii) the hybrid 3′UTR of the present invention as defined above, which is operably linked to the protein coding sequence of (ii),
-   (c) performing a PCR, and
-   (d) obtaining the amplification product of step (c).
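Steps (a) to (d) can be sketched in silico as follows. This is a minimal Python illustration, not a primer design tool: it assumes exact primer matches, a fixed primer length of 20 nucleotides, and a cassette whose end sequences are unique in the vector. The function names and the toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_end_primers(cassette, length=20):
    """Return (forward, reverse) primers matching the two ends of the cassette."""
    cassette = cassette.upper()
    if len(cassette) < 2 * length:
        raise ValueError("cassette is too short for two non-overlapping primers")
    forward = cassette[:length]                        # anneals near the 5' end
    reverse = reverse_complement(cassette[-length:])   # anneals near the 3' end
    return forward, reverse


def in_silico_pcr(template, forward, reverse):
    """Return the predicted linear product, or None if a primer site is missing."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse)  # reverse primer site on the top strand
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Toy sequences standing in for promoter, coding region, and hybrid 3'UTR.
promoter = "TTGACAGCTAGCTCAGTCCT"
coding = "ATGGTGAGCAAGGGCGAGGAGTAA"
hybrid_utr = "GCTTCATTAGGCAATAAAGTCTGAGTGCGA"
cassette = promoter + coding + hybrid_utr
vector = "CCCC" + cassette + "GGGG"

fwd, rev = design_end_primers(cassette)
print(in_silico_pcr(vector, fwd, rev) == cassette)
```

The predicted product corresponds to the linear cassette of step (d), containing the promoter, the coding region, and the hybrid 3′UTR in 5′ to 3′ order.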

In another aspect of the invention the nucleic acids according to the present invention are preferably produced in linear form by a method that comprises the following steps of

-   (a) providing an expression vector comprising a nucleic acid according to the present invention, wherein said nucleic acid comprises the following components in 5′ to 3′ direction:
    -   (i) a promoter sequence,
    -   (ii) a protein coding region sequence, and
    -   (iii) the 3′UTR of EEF1A1 or parts thereof as defined above, which is operably linked to the protein coding sequence of (ii),
-   (b) providing at least two primers, wherein one primer specifically hybridizes to regions near the 5′ end of the nucleic acid according to the present invention, and wherein the other primer specifically hybridizes to regions near the 3′ end of the nucleic acid according to the present invention, wherein said nucleic acid comprises the following components in 5′ to 3′ direction:
    -   (i) a promoter sequence,
    -   (ii) a protein coding region sequence, and
    -   (iii) the 3′UTR of EEF1A1 or parts thereof as defined above, which is operably linked to the protein coding sequence of (ii),
-   (c) performing a PCR, and
-   (d) obtaining the amplification product of step (c).

The problem is further solved by the present invention by providing a host cell characterized in that it contains the expression vector of the present invention or the recombinant linear nucleic acid of the present invention, and further characterized in that it transiently or stably expresses the protein encoded in the expression vector of the present invention or the recombinant linear nucleic acid of the present invention.

Transient and stable expression of proteins is known in the art. When proteins are expressed transiently, a foreign gene that codes for the particular protein is expressed by recipient/host cells over a relatively brief time span, wherein the gene is not integrated into the genome of the host cell, whereas in the case of stable expression of proteins the foreign coding gene is integrated into the genome of the host cell.

The host cells of the present invention are preferably obtained by in vivo injection of the expression vector of the present invention or the recombinant linear nucleic acid of the present invention into cells.

A preferred method for obtaining a host cell of the present invention comprises the following steps of

-   (a) providing an expression vector of the present invention or a recombinant linear nucleic acid of the present invention,
-   (b) in vivo injection of the expression vector of the present invention or the recombinant linear nucleic acid of the present invention into cells,
-   (c) obtaining the injected cells of step (b).

Suitable cells are known in the art. However, preferred cells are HEK 293, HeLa, Huh7, COS-7, and CHO cell lines.

A preferred embodiment is a method for expressing proteins, comprising providing and culturing host cells of the present invention under conditions allowing transient or stable expression of proteins, and obtaining said expressed proteins.

Culturing conditions of cells and cell culturing conditions that allow the expression of proteins are known in the art. Preferred conditions are a temperature of about 37° C., a humidified atmosphere, and a medium containing buffers with a salt concentration and a pH suitable for mammalian cell culture.

The proteins can be expressed either in a secreted form or in cellular compartments. The expressed proteins can further be obtained by methods for isolating and purifying proteins from cells, which are known in the art. Preferred methods for purification are affinity chromatography, immuno-affinity chromatography, protein precipitation, buffer exchanges, ion-exchange chromatography, hydrophobic interaction chromatography, size-exclusion chromatography and electrophoresis-based purification.

Mammalian expression vectors are powerful research and industrial biotechnology tools. They comprise as a minimum requirement a eukaryotic promoter, a protein coding sequence, and a 3′UTR that contains an efficient polyadenylation signal. Many attempts at enhancing protein expression from vectors have focused on promoters. In this invention, enhanced efficiency of protein expression was achieved by modifying the 3′UTR that is continuously and directionally operably linked to the protein coding sequence.

The 3′UTRs, which include the polyadenylation signal and upstream and downstream polyadenylation context sequences, can be rendered efficient by making a hybrid 3′UTR consisting of two regions: a first region from a 3′ untranslated region of a stable eukaryotic mRNA that does not contain a polyadenylation signal, and a second region from the downstream end of a 3′ untranslated region of a stable eukaryotic mRNA that contains a polyadenylation signal.

The sequences can be obtained by amplification of the said region from the 3′UTR of the housekeeping mRNA using RT-PCR with two primers specific to the region of interest. These PCR products are then cloned into a vector, such as a plasmid or viral vector, which already contains the second region. Preferred examples of vectors are pUC19-based plasmids, pcDNA3.1, Gwiz, CMVSport, etc., which are available from different research tool companies, such as Invitrogen, Clontech, Stratagene and Gene Therapy Systems, Inc. There are different preferred ways of cloning, such as restriction site-directed cloning, blunt cloning, or recombination-based cloning. The hybrid UTR can preferably be created by conventional cloning techniques involving restriction enzyme digestion of commercially available plasmids and cDNA molecules, or can be synthesized using PCR or an automated DNA synthesizer using methods known in the art.

It has been shown that a large fraction of human polyadenylation sites is flanked by U-rich elements, both upstream and downstream of the cleavage site, located around positions 0 to -50 and +20 to +60, relative to the polyA signal (Legendre and Gautheret 2003). Thus, it is likely that the enhanced efficiency of protein expression which is achieved by the present invention is caused by the presence of additional strong upstream sequences in the 3′UTR of EEF1A1.
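The U content of the two windows mentioned above can be estimated with a short Python sketch. It assumes that position 0 is the first base of the last AAUAAA hexamer in the sequence and that a simple U fraction is an adequate proxy for a U-rich element; both choices, and the function names, are illustrative assumptions rather than the method of the cited study.

```python
def u_fraction(seq):
    """Fraction of U residues in seq (0.0 for an empty sequence)."""
    return seq.count("U") / len(seq) if seq else 0.0


def u_richness_around_signal(rna, signal="AAUAAA"):
    """Return (upstream, downstream) U fractions, or None if no signal is found.

    Upstream window: positions -50 to -1 relative to the signal start.
    Downstream window: positions +20 to +60 relative to the signal start.
    Windows are clipped at the ends of the sequence.
    """
    rna = rna.upper().replace("T", "U")
    pos = rna.rfind(signal)
    if pos == -1:
        return None
    upstream = rna[max(0, pos - 50):pos]
    downstream = rna[pos + 20:pos + 61]
    return u_fraction(upstream), u_fraction(downstream)


example = "U" * 50 + "AAUAAA" + "C" * 14 + "U" * 41
print(u_richness_around_signal(example))
```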

Once the expression vectors containing the hybrid molecules are generated, they can be transfected into cells by any known transfection method. Alternatively, linear fragments that contain the minimum linear cassette, i.e. the promoter, coding region, and hybrid 3′UTR, can be generated by PCR or by gene synthesis.

EXAMPLES

The following examples are intended for illustration purposes only, and should not be construed as limiting the scope of the present invention in any way.

Example 1

We used the list of housekeeping genes that were identified by Eisenberg and Levanon (2003) based on constitutive expression in all tissues. We assumed that a housekeeping gene with an abundance of more than 1000 ESTs (total number of expressed sequence tags in the EST database per expressed gene) is stable, because abundant transcripts are likely to be associated with stable mRNAs. The abundance data were obtained from dbEST, a database of expressed sequence tags that is publicly available at the National Center for Biotechnology Information (NCBI, USA). Table 1 below shows the list of these abundant housekeeping genes, and this list also constitutes the preferred sequence data source for the hybrid 3′UTR used in this invention. When compared to our list of unstable mRNAs (derived from AU-rich mRNAs, which tend to be unstable), we found several statistical differences. The average abundance of housekeeping mRNAs was 1151±78 (SEM, n=570), i.e. about 5-fold higher than that of unstable mRNAs, which had an average abundance of 220±16 (n=266). The average length of the 3′UTR of stable mRNAs appears shorter than that of unstable mRNAs (AU-rich mRNAs). The average length of the 3′UTR of the housekeeping mRNAs was 676±30 nucleotides (SEM, n=572), while the average length of the 3′UTR of abundant housekeeping mRNAs, i.e., those with more than 1000 ESTs, was 570±50 nucleotides (SEM, n=178). In contrast, the average length of unstable AU-rich mRNAs was more than 1560±39 nucleotides (SEM, n=1027).
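The summary statistics quoted above are of the form mean ± SEM. A minimal Python sketch of that calculation, together with the abundance cut-off and the fold difference between the two reported means, is given below. The function names are hypothetical, and the SEM is assumed to be the sample standard deviation divided by the square root of n; the text does not state which estimator was used.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for at least two values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the SEM")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def is_abundant(est_count, threshold=1000):
    """Apply the cut-off from the text: more than 1000 ESTs counts as abundant."""
    return est_count > threshold


# Fold difference between the two mean abundances reported in the text.
fold = 1151 / 220
print(f"housekeeping vs unstable abundance: {fold:.1f}-fold")
```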

TABLE 1 List of Abundant Housekeeping Genes

Acc | Definition | Symbol^(a) | Length^(b) | Abundance^(c)
NM_001402 | Eukaryotic translation elongation factor 1 alpha 1 | EEF1A1 | 387 | 20011
NM_001614 | Actin, gamma 1 | ACTG1 | 718 | 16084
NM_002046 | Glyceraldehyde-3-phosphate dehydrogenase | GAPD | 201 | 15931
NM_001101 | Actin, beta | ACTB | 593 | 15733
NM_000967 | Ribosomal protein L3 | RPL3 | 74 | 10924
NM_006082 | Tubulin, alpha, ubiquitous | K-ALPHA-1 | 174 | 10416
NM_001428 | Enolase 1, (alpha) | ENO1 | 357 | 9816
NM_006098 | Guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1 | GNB2L1 | 45 | 8910
NM_002032 | Ferritin, heavy polypeptide 1 | FTH1 | 138 | 8861
NM_002654 | Pyruvate kinase, muscle | PKM2 | 643 | 7413
NM_004048 | Beta-2-microglobulin | B2M | 568 | 7142
NM_006597 | Heat shock 70 kDa protein 8 | HSPA8 | 258 | 6066
NM_000034 | Aldolase A, fructose-bisphosphate | ALDOA | 252 | 5703
NM_021009 | Ubiquitin C | UBC | 67 | 5579
NM_006013 | Ribosomal protein L10 | RPL10 | 1,503 | 5572
NM_012423 | Ribosomal protein L13a | RPL13A | 509 | 5552
NM_007355 | Heat shock 90 kDa protein 1, beta | HSPCB | 309 | 5436
NM_004046 | ATP synthase, H+ transporting, mitochondrial F1 complex, alpha subunit, isoform 1, cardiac muscle | ATP5A1 | 164 | 5434
NM_000516 | GNAS complex locus | GNAS | 362 | 4677
NM_001743 | Calmodulin 2 (phosphorylase kinase, delta) | CALM2 | 611 | 4306
NM_005566 | Lactate dehydrogenase A | LDHA | 566 | 4186
NM_000973 | Ribosomal protein L8 | RPL8 | 92 | 4042
NM_002948 | Ribosomal protein L15 | RPL15 | 1,368 | 3861
NM_000977 | Ribosomal protein L13 | RPL13 | 424 | 3774
NM_002952 | Ribosomal protein S2 | RPS2 | 86 | 3758
NM_005507 | Cofilin 1 (non-muscle) | CFL1 | 508 | 3616
NM_004039 | Annexin A2 | ANXA2 | 294 | 3560
NM_021019 | Myosin, light polypeptide 6, alkali, smooth muscle and non-muscle | MYL6 | 209 | 3512
NM_002300 | Lactate dehydrogenase B | LDHB | 230 | 3501
NM_003217 | Testis enhanced gene transcript (BAX inhibitor 1) | TEGT | 1,847 | 3438
NM_002568 | Poly(A) binding protein, cytoplasmic 1 | PABPC1 | 445 | 3241
NM_001015 | Ribosomal protein S11 | RPS11 | 85 | 3220
NM_003973 | Ribosomal protein L14 | RPL14 | 156 | 3198
NM_000969 | Ribosomal protein L5 | RPL5 | 78 | 3167
NM_007104 | Ribosomal protein L10a | RPL10A | 32 | 3079
NM_001642 | Amyloid beta (A4) precursor-like protein 2 | APLP2 | 1,364 | 3002
NM_001418 | Eukaryotic translation initiation factor 4 gamma, 2 | EIF4G2 | 791 | 2913
NM_002635 | Solute carrier family 25 (mitochondrial carrier; phosphate carrier), member 3 | SLC25A3 | 197 | 2900
NM_001009 | Ribosomal protein S5 | RPS5 | 58 | 2897
NM_000291 | Phosphoglycerate kinase 1 | PGK1 | 1,016 | 2858
NM_001728 | Basigin (OK blood group) | BSG | 769 | 2827
NM_001658 | ADP-ribosylation factor 1 | ARF1 | 1,194 | 2772
NM_001003 | Ribosomal protein, large, P1 | RPLP1 | 39 | 2770
NM_018955 | Ubiquitin B | UBB | 144 | 2732
NM_005998 | Chaperonin containing TCP1, subunit 3 (gamma) | CCT3 | 255 | 2709
NM_001967 | Eukaryotic translation initiation factor 4A, Isoform 2 | EIF4A2 | 626 | 2693
NM_001469 | Thyroid autoantigen 70 kDa (Ku antigen) | G22P1 | 259 | 2682
NM_000918 | Procollagen-proline, 2-oxoglutarate 4-dioxygenase (proline 4-hydroxylase), beta polypeptide (protein disulfide isomerase; thyroid hormone binding protein p55) | P4HB | 868 | 2659
NM_002574 | Peroxiredoxin 1 | PRDX1 | 323 | 2604
NM_001020 | Ribosomal protein S16 | RPS16 | 78 | 2573
NM_007363 | Non-POU domain containing, octamer-binding | NONO | 1,119 | 2557
NM_001022 | Ribosomal protein S19 | RPS19 | 63 | 2533
NM_001675 | Activating transcription factor 4 (tax-responsive enhancer element B67) | ATF4 | 85 | 2479
NM_005617 | Ribosomal protein S14 | RPS14 | 78 | 2465
NM_001664 | Ras homolog gene family, member A | RHOA | 1,045 | 2426
NM_005801 | Putative translation initiation factor | SUI1 | 836 | 2425
NM_000981 | Ribosomal protein L19 | RPL19 | 80 | 2381
NM_000979 | Ribosomal protein L18 | RPL18 | 49 | 2362
NM_001026 | Ribosomal protein S24 | RPS24 | 77 | 2355
NM_000975 | Ribosomal protein L11 | RPL11 | 53 | 2314
NM_002117 | Major histocompatibility complex, class I, C | HLA-C | 434 | 2278
NM_004068 | Adaptor-related protein complex 2, mu 1 subunit | AP2M1 | 494 | 2230
NM_006429 | Chaperonin containing TCP1, subunit 7 (eta) | CCT7 | 164 | 2216
NM_022551 | Ribosomal protein S18 | RPS18 | 5,538 | 2208
NM_001013 | Ribosomal protein S9 | RPS9 | 73 | 2113
NM_005594 | Nascent-polypeptide-associated complex alpha polypeptide | NACA | 133 | 2075
NM_001028 | Ribosomal protein S25 | RPS25 | 74 | 2066
NM_032378 | Eukaryotic translation elongation factor 1 delta (guanine nucleotide exchange protein) | EEF1D | 76 | 2051
NM_000999 | Ribosomal protein L38 | RPL38 | 50 | 2007
NM_000994 | Ribosomal protein L32 | RPL32 | 64 | 2003
NM_007008 | Reticulon 4 | RTN4 | 973 | 1969
NM_001909 | Cathepsin D (lysosomal aspartyl protease) | CTSD | 834 | 1940
NM_006325 | RAN, member RAS oncogene family | RAN | 892 | 1906
NM_003406 | Tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide | YWHAZ | 2,013 | 1892
NM_006888 | Calmodulin 1 (phosphorylase kinase, delta) | CALM1 | 3,067 | 1880
NM_004339 | Pituitary tumor-transforming 1 interacting protein | PTTG1IP | 1,985 | 1837
NM_005022 | Profilin 1 | PFN1 | 289 | 1787
NM_001961 | Eukaryotic translation elongation factor 2 | EEF2 | 504 | 1754
NM_003091 | Small nuclear ribonucleoprotein polypeptides B and B1 | SNRPB | 295 | 1735
NM_006826 | Tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, theta polypeptide | YWHAQ | 1,310 | 1726
NM_002140 | Heterogeneous nuclear ribonucleoprotein K | HNRPK | 1,227 | 1725
NM_001064 | Transketolase (Wernicke-Korsakoff syndrome) | TKT | 167 | 1721
NM_021103 | Thymosin, beta 10 | TMSB10 | 317 | 1714
NM_004309 | Rho GDP dissociation inhibitor (GDI) alpha | ARHGDIA | 1,206 | 1702
NM_002473 | Myosin, heavy polypeptide 9, non-muscle | MYH9 | 1,392 | 1692
NM_000884 | IMP (Inosine monophosphate) dehydrogenase 2 | IMPDH2 | 63 | 1690
NM_001004 | Ribosomal protein, large P2 | RPLP2 | 59 | 1688
NM_001746 | Calnexin | CANX | 2,302 | 1677
NM_002819 | Polypyrimidine tract binding protein 1 | PTBP1 | 1,561 | 1663
NM_000988 | Ribosomal protein L27 | RPL27 | 59 | 1660
NM_004404 | Neural precursor cell expressed, developmentally down-regulated 5 | NEDD5 | 2,090 | 1654
NM_005347 | Heat shock 70 kDa protein 5 (glucose-regulated protein, 78 kDa) | HSPA5 | 1,757 | 1651
NM_000175 | Glucose phosphate isomerase | GPI | 296 | 1635
NM_001207 | Basic transcription factor 3 | BTF3 | 300 | 1632
NM_003186 | Transgelin | TAGLN | 405 | 1612
NM_003334 | Ubiquitin-activating enzyme E1 (A1S9T and BN75 temperature sensitivity complementing) | UBE1 | 199 | 1590
NM_001018 | Ribosomal protein S15 | RPS15 | 32 | 1574
NM_003404 | Tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, beta polypeptide | YWHAB | 2,088 | 1523
NM_003753 | Eukaryotic translation initiation factor 3, subunit 7 zeta, 66/67 kDa | EIF3S7 | 152 | 1509
NM_005762 | Tripartite motif-containing 28 | TRIM28 | 193 | 1507
NM_005381 | Nucleolin | NCL | 284 | 1501
NM_000995 | Ribosomal protein L34 | RPL34 | 450 | 1495
NM_002823 | Prothymosin, alpha (gene sequence 28) | PTMA | 720 | 1462
NM_002415 | Macrophage migration inhibitory factor (glycosylation-inhibiting factor) | MIF | 117 | 1459
NM_002128 | High-mobility group box 1 | HMGB1 | 1,527 | 1457
NM_006908 | Ras-related C3 botulinum toxin substrate 1 (rho family, small GTP binding protein Rac1) | RAC1 | 1,536 | 1437
NM_002070 | Guanine nucleotide binding protein (G protein), alpha inhibiting activity polypeptide 2 | GNAI2 | 512 | 1435
NM_001997 | Finkel-Biskis-Reilly murine sarcoma virus (FBR-MuSV) ubiquitously expressed (fox derived); ribosomal protein S30 | FAU | 68 | 1428
NM_014390 | Staphylococcal nuclease domain containing 1 | SND1 | 556 | 1422
NM_014764 | DAZ associated protein 2 | DAZAP2 | 1,322 | 1419
NM_005917 | Malate dehydrogenase 1, NAD (soluble) | MDH1 | 208 | 1396
NM_001494 | GDP dissociation inhibitor 2 | GDI2 | 785 | 1395
NM_014225 | Protein phosphatase 2 (formerly 2A), regulatory subunit A (PR 65), alpha isoform | PPP2R1A | 472 | 1391
NM_001660 | ADP-ribosylation factor 4 | ARF4 | 858 | 1382
NM_001823 | Creatine kinase, brain | CKB | 206 | 1381
NM_003379 | Villin 2 (ezrin) | VIL2 | 1,272 | 1380
NM_000182 | Hydroxyacyl-Coenzyme A dehydrogenase/3-ketoacyl-Coenzyme A thiolase/enoyl-Coenzyme A hydratase (trifunctional protein), alpha subunit | HADHA | 647 | 1379
NM_003746 | Dynein, cytoplasmic, light polypeptide 1 | DNCL1 | 281 | 1375
NM_007103 | NADH dehydrogenase (ubiquinone) flavoprotein 1, 51 kDa | NDUFV1 | 103 | 1352
NM_000992 | Ribosomal protein L29 | RPL29 | 164 | 1349
NM_007209 | Ribosomal protein L35 | RPL35 | 35 | 1345
NM_006623 | Phosphoglycerate dehydrogenase | PHGDH | 231 | 1340
NM_002796 | Proteasome (prosome, macropain) subunit, beta type, 4 | PSMB4 | 108 | 1340
NM_002808 | Proteasome (prosome, macropain) 26S subunit, non-ATPase, 2 | PSMD2 | 231 | 1326
NM_000454 | Superoxide dismutase 1, soluble (amyotrophic lateral sclerosis 1 (adult)) | SOD1 | 346 | 1323
NM_003915 | RNA binding motif protein 12 | RBM12 | 216 | 1323
NM_004924 | Actinin, alpha 4 | ACTN4 | 1,099 | 1316
NM_006086 | Tubulin, beta 3 | TUBB3 | 296 | 1314
NM_001016 | Ribosomal protein S12 | RPS12 | 56 | 1304
NM_003365 | Ubiquinol-cytochrome c reductase core protein I | UQCRC1 | 126 | 1303
NM_003016 | Splicing factor, arginine/serine-rich 2 | SFRS2 | 1,059 | 1301
NM_007273 | Repressor of estrogen receptor activity | REA | 332 | 1281
NM_014610 | Glucosidase, alpha; neutral AB | GANAB | 1,652 | 1280
NM_001749 | Calpain, small subunit 1 | CAPNS1 | 514 | 1270
NM_005080 | X-box binding protein 1 | XBP1 | 1,003 | 1269
NM_005216 | Dolichyl-diphosphooligosaccharide-protein glycosyltransferase | DDOST | 616 | 1268
NM_004640 | HLA-B associated transcript 1 | BAT1 | 237 | 1262
NM_021983 | Major histocompatibility complex, class II, DR beta 4 | HLA-DRB1 | 313 | 1251
NM_013234 | Eukaryotic translation initiation factor 3 subunit k | eIF3k | 84 | 1251
NM_004515 | Interleukin enhancer binding factor 2, 45 kDa | ILF2 | 384 | 1249
NM_000997 | Ribosomal protein L37 | RPL37 | 50 | 1244
NM_000801 | FK506 binding protein 1A, 12 kDa | FKBP1A | 1,149 | 1243
NM_000985 | Ribosomal protein L17 | RPL17 | 58 | 1243
NM_001014 | Ribosomal protein S10 | RPS10 | 57 | 1232
NM_001069 | Tubulin, beta 2 | TUBB2 | 194 | 1230
NM_004960 | Fusion (involved in t(12; 16) in malignant liposarcoma) | FUS | 166 | 1197
NM_005165 | Aldolase C, fructose-bisphosphate | ALDOC | 432 | 1195
NM_004930 | Capping protein (actin filament) muscle Z-line, beta | CAPZB | 259 | 1193
NM_000239 | Lysozyme (renal amyloidosis) | LYZ | 1,016 | 1190
NM_007263 | Coatomer protein complex, subunit epsilon | COPE | 263 | 1179
NM_001861 | Cytochrome c oxidase subunit IV isoform 1 | COX4I1 | 129 | 1178
NM_003757 | Eukaryotic translation initiation factor 3, subunit 2 beta, 36 kDa | EIF3S2 | 408 | 1169
NM_005745 | B-cell receptor-associated protein 31 | BCAP31 | 438 | 1166
NM_002743 | Protein kinase C substrate 80K-H | PRKCSH | 337 | 1158
NM_004161 | RAB1A, member RAS oncogene family | RAB1A | 638 | 1115
NM_002080 | Glutamic-oxaloacetic transaminase 2, mitochondrial (aspartate aminotransferase 2) | GOT2 | 1,039 | 1114
NM_005731 | Actin related protein 2/3 complex, subunit 2, 34 kDa | ARPC2 | 448 | 1113
NM_006445 | PRP8 pre-mRNA processing factor 8 homolog (yeast) | PRPF8 | 173 | 1110
NM_001867 | Cytochrome c oxidase subunit VIIc | COX7C | 168 | 1106
NM_002375 | Microtubule-associated protein 4 | MAP4 | 1,164 | 1102
NM_003145 | Signal sequence receptor, beta (translocon-associated protein beta) | SSR2 | 492 | 1099
NM_001788 | CDC10 cell division cycle 10 homolog (S. cerevisiae) | CDC10 | 1,015 | 1094
NM_006513 | Seryl-tRNA synthetase | SARS | 323 | 1085
NM_003754 | Eukaryotic translation initiation factor 3, subunit 5 epsilon, 47 kDa | EIF3S5 | 152 | 1081
NM_005112 | WD repeat domain 1 | WDR1 | 845 | 1080
NM_004893 | H2A histone family, member Y | H2AFY | 635 | 1072
NM_004494 | Hepatoma-derived growth factor (high-mobility group protein 1-like) | HDGF | 1,339 | 1069
NM_001436 | Fibrillarin | FBL | 111 | 1069
NM_003752 | Eukaryotic translation initiation factor 3, subunit 8, 110 kDa | EIF3S8 | 201 | 1060
NM_003321 | Tu translation elongation factor, mitochondrial | TUFM | 207 | 1038
NM_001119 | Adducin 1 (alpha) | ADD1 | 1,569 | 1037
NM_005273 | Guanine nucleotide binding protein (G protein), beta polypeptide 2 | GNB2 | 386 | 1030
NM_006755 | Transaldolase 1 | TALDO1 | 256 | 1026
NM_023009 | MARCKS-like 1 | MARCKSL1 | 774 | 1014
NM_002799 | Proteasome (prosome, macropain) subunit, beta type, 7 | PSMB7 | 162 | 1012
NM_002539 | Ornithine decarboxylase 1 | ODC1 | 343 | 1009
NM_006801 | KDEL (Lys-Asp-Glu-Leu) endoplasmic reticulum protein retention receptor 1 | KDELR1 | 742 | 1007
NM_014944 | Calsyntenin 1 | CLSTN1 | 1,481 | 1003
NM_007262 | Parkinson disease (autosomal recessive, early onset) 7 | PARK7 | 253 | 1002

The above list of accession numbers for housekeeping genes was obtained from Eisenberg and Levanon (2003). The accession numbers were used as input for a PERL (Practical Extraction and Report Language) computer program that extracts EST data from the Unigene database. The Unigene database was downloaded as a text file from the NCBI website.
The length of the 3′UTR wasderived by computationally extracting the 3′UTR (Bakheet al, 2001).^(a)is a commonly used abbreviation of the gene product; ^(b)is thelength of the 3′UTR; ^(c)is the number of ESTs.
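
The EST-counting step described above can be illustrated with a short sketch. It is written in Python rather than Perl and is not the program used for the table. It assumes a flat-file layout in which records end with a `//` line and each member sequence is listed on a `SEQUENCE` line carrying `ACC=` and `SEQTYPE=` fields; the function name, the sample records, and the accession numbers in `SAMPLE` are invented for illustration only.

```python
def count_ests_by_accession(unigene_text, accessions):
    """Map each requested accession to the EST count of its cluster.

    Assumed layout (not taken from the source): records are terminated
    by a line containing '//' and list members on 'SEQUENCE' lines with
    'KEY=VALUE' fields separated by ';'.
    """
    wanted = set(accessions)
    counts = {}
    for record in unigene_text.split("\n//"):
        est_count = 0
        found = set()
        for line in record.splitlines():
            if not line.startswith("SEQUENCE"):
                continue
            fields = dict(
                part.strip().split("=", 1)
                for part in line[len("SEQUENCE"):].split(";")
                if "=" in part
            )
            if fields.get("SEQTYPE") == "EST":
                est_count += 1
            accession = fields.get("ACC", "").split(".")[0]  # drop version suffix
            if accession in wanted:
                found.add(accession)
        for accession in found:
            counts[accession] = est_count
    return counts


# Invented sample records; the accession numbers are placeholders.
SAMPLE = """\
ID          Hs.0001
TITLE       Hypothetical gene A
SEQUENCE    ACC=NM_999991.1; SEQTYPE=mRNA
SEQUENCE    ACC=AA000001.1; SEQTYPE=EST
SEQUENCE    ACC=AA000002.1; SEQTYPE=EST
//
ID          Hs.0002
TITLE       Hypothetical gene B
SEQUENCE    ACC=NM_999992.2; SEQTYPE=mRNA
SEQUENCE    ACC=AA000003.1; SEQTYPE=EST
//
"""

if __name__ == "__main__":
    print(count_ests_by_accession(SAMPLE, ["NM_999991", "NM_999992"]))
```

Sorting the resulting counts in descending order would give a ranking of the kind shown in the table.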

Example 2

A standard pUC19-based expression vector containing green fluorescent protein (GFP) was used. It contains the human cytomegalovirus (CMV) immediate early promoter, the coding region of the enhanced GFP (EGFP) gene, and the 3′UTR of bovine growth hormone, the latter of which contains the polyA signal. Suitable expression vectors could, for example, be purchased from Invitrogen, Clontech, Invivogen, Gene Therapy Systems, and Promega, Inc. A PCR product derived from a portion of the 3′UTR of human elongation factor 1 alpha, EEF1A, was generated using a forward primer that contains a BamHI restriction site at the 5′ terminus of the primer (SEQ ID NO: 1) and a reverse primer that contains an XbaI restriction site at the 5′ terminus of the primer (SEQ ID NO: 2). The sequences of the primers are as follows:

     BamHI (SEQ ID NO: 1): GCACCGGATCCAATATTATCCCTAATACCTG
     XbaI (SEQ ID NO: 2): GCCAGTCTAGAAATAACTTAAAACTGCCA
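
As a quick consistency check on the two primers, the following Python sketch locates the recognition sequences of BamHI (GGATCC) and XbaI (TCTAGA) within them. The primer sequences are copied from the text; the helper names are illustrative and not part of the described method.

```python
# Recognition sequences of the two restriction enzymes named in the text.
SITES = {"BamHI": "GGATCC", "XbaI": "TCTAGA"}

# Primer sequences as given above (5' to 3').
PRIMERS = {
    "forward": ("BamHI", "GCACCGGATCCAATATTATCCCTAATACCTG"),
    "reverse": ("XbaI", "GCCAGTCTAGAAATAACTTAAAACTGCCA"),
}


def site_offset(primer, site):
    """Return the 0-based offset of `site` in `primer`, or -1 if absent."""
    return primer.upper().find(site.upper())


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    for name, (enzyme, seq) in PRIMERS.items():
        offset = site_offset(seq, SITES[enzyme])
        print(f"{name} primer: {enzyme} site at offset {offset}")
```

In both primers the site is preceded by five extra bases at the 5′ terminus, which is a common design choice because many restriction enzymes cut poorly at the very end of a DNA fragment.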

The primers amplify a 210 bp region 1461-1680 in EEF1A, which has the accession number NM_001402. The PCR product was cut using the restriction enzymes BamHI and XbaI (New England BioLabs, NEB). Briefly, 10 μg of the PCR product was digested with 10 units of XbaI in a buffer containing 0.1 μg/ml BSA for 1 hr at 37° C., followed by digestion with BamHI in BamHI buffer for an additional 1 hr at 37° C. The digested PCR products were extracted using a phenol/chloroform method followed by ethanol precipitation. The PCR region was cloned into the GFP vector, which had previously been digested with the same restriction enzymes (BamHI and XbaI) and purified using phenol/chloroform extraction and ethanol precipitation. The BamHI and XbaI sites are located downstream of the end of the EGFP coding region and upstream of the polyA signal (FIG. 1A). Cloning of the PCR products into the vector was achieved using a ligation reaction: 30 ng of digested vector DNA was mixed with 90 ng of digested PCR products in a 10 μl reaction containing T4 DNA ligase. The ligated products were used to transform competent DH5α E. coli cells, followed by expansion of the resulting colonies in bacterial culture medium. The recombinant DNA was extracted using the Qiagen plasmid purification kit (Qiagen, Germany). The sizes of the vectors harboring the inserts were verified using gel electrophoresis. The size of the vector without the modified 3′UTR is 5757 bases and the size of the vector with the modified 3′UTR is 5933 bases.
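
The ligation step mixes vector and insert by mass, so it can be useful to express the mixture as a molar ratio. The sketch below is an illustration only: it assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, takes the 210 bp insert length and the 5757 bp vector length stated above at face value, and ignores the few bases removed by digestion. The function names are hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA (approximate, assumed)


def picomoles(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA in ng to picomoles."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def insert_to_vector_ratio(insert_ng, insert_bp, vector_ng, vector_bp):
    """Molar ratio of insert to vector in a ligation mixture."""
    return picomoles(insert_ng, insert_bp) / picomoles(vector_ng, vector_bp)


if __name__ == "__main__":
    # 90 ng of a 210 bp insert with 30 ng of a 5757 bp vector, as in the text.
    ratio = insert_to_vector_ratio(90, 210, 30, 5757)
    print(f"insert:vector molar ratio is about {ratio:.0f}:1")
```

Under these assumptions the mixture contains a large molar excess of insert over vector, because the insert is far shorter than the vector while being present at three times the mass.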

The recombinant hybrid 3′UTR sequence (˜400 bp) was searched against the NCBI human genome databases for the best homology and was found to contain the following: a portion of human EEF1A1 and a portion of bovine growth hormone (bGH) containing the polyA signal. The sequence of the entire hybrid 3′UTR is given in Table 2 (SEQ ID NO: 3).

TABLE 2 Sequence of the efficient hybrid 3′UTR

ggatccaaatattatccctaatacctgccaccccactcttaatcagtggtggaagaacggtctcagaactgtttgtttcaattggccatttaagtttagtagtaaaagactggttaatgataacaatgcatcgtaaaaccttcagaaggaaaggagaatgttttgtggaccactttggttttcttttttgcgtgtggcagttttaagttattctctagagatctgtgtgttggttttttgtggatctgctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcagcacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatgggta

The polyA signal (AATAAA) is underlined.
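
The layout of the hybrid 3′UTR can be checked by locating the two cloning sites and the polyA signal in the Table 2 sequence. The Python sketch below does this with plain string searches; the sequence is copied from Table 2 and split at the XbaI site and at the AATAAA motif purely for readability, and the function and variable names are illustrative.

```python
# Table 2 sequence, split into three pieces for readability only.
HYBRID_UTR = (
    # BamHI site followed by the EEF1A-derived region
    "ggatccaaatattatccctaatacctgccaccccactcttaatcagtggtggaagaacggtctcagaactgtttgtttcaattggccatttaagtttagtagtaaaagactggttaatgataacaatgcatcgtaaaaccttcagaaggaaaggagaatgttttgtggaccactttggttttcttttttgcgtgtggcagttttaagttattc"
    # XbaI site followed by the bGH-derived region upstream of the signal
    "tctagagatctgtgtgttggttttttgtggatctgctgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcctttcct"
    # polyA signal and downstream sequence
    "aataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcagcacagcaagggggaggattgggaagacaatagcaggcatgctggggatgcggtgggctctatgggta"
)

MOTIFS = {"BamHI": "ggatcc", "XbaI": "tctaga", "polyA": "aataaa"}


def find_motifs(seq, motifs):
    """Return the 0-based offset of the first occurrence of each motif (-1 if absent)."""
    seq = seq.lower()
    return {name: seq.find(motif.lower()) for name, motif in motifs.items()}


if __name__ == "__main__":
    for name, offset in find_motifs(HYBRID_UTR, MOTIFS).items():
        print(f"{name}: offset {offset}")
```

The expected order is the BamHI site at the start, the XbaI site at the junction between the two regions, and the polyA signal further downstream, which matches the cloning strategy of Example 2.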

The resultant expression vector with the hybrid 3′UTR was used for functional studies to confirm the expression of the encoded protein, namely GFP. The HEK293 cell line was used for transfection. HEK293 cells were grown at standard culture conditions (37° C., 5% CO₂) in RPMI 1788 medium supplemented with 10% FBS and antibiotics (Gibco BRL, Gaithersburg, Md.). 3×10⁴ cells per well in 96-well plates were transfected with 1 μg of the original (unmodified) expression vector or the modified vector. Transfections were performed in serum-free medium using Lipofectamine 2000 (Gibco) for 5 h, followed by replacing the medium with serum-supplemented medium. After approximately 48 hours, the cells were visualized using a fluorescence microscope at the optimal excitation wavelength for GFP of 488 nm and an emission wavelength of 503 nm. Images were captured using a camera mounted on top of the microscope. The images were read by an algorithm that quantitates total fluorescence intensities in pixels. FIG. 2 shows exemplary images and the fluorescence results of these images. The modified vector, i.e. the vector with the hybrid 3′UTR, resulted in an approximately 3-fold increase in protein expression as evaluated by GFP expression.
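
The text does not specify the image quantitation algorithm. One minimal way to compute a total fluorescence intensity and a fold change between two images is sketched below in Python. The background subtraction, the function names, and the tiny example images are assumptions made for illustration and are not data from the experiments.

```python
def total_intensity(image, background=0):
    """Sum background-subtracted pixel values of an image given as a list of rows."""
    return sum(max(pixel - background, 0) for row in image for pixel in row)


def fold_change(modified, original, background=0):
    """Ratio of total intensity in the modified-vector image to the original-vector image."""
    reference = total_intensity(original, background)
    if reference == 0:
        raise ValueError("reference image has no signal above background")
    return total_intensity(modified, background) / reference


# Invented 2x2 grayscale images, for illustration only.
ORIGINAL_IMAGE = [[10, 20], [10, 40]]
MODIFIED_IMAGE = [[10, 50], [40, 60]]

if __name__ == "__main__":
    print(fold_change(MODIFIED_IMAGE, ORIGINAL_IMAGE, background=10))
```

Subtracting a fixed background before summing keeps dark pixels from contributing to the total, so the ratio reflects signal from fluorescent cells rather than image size.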

Example 3

The expression vector with the hybrid 3′UTR was used in two types of cell lines: HeLa, a cervical cell line (FIGS. 3A and B), and Huh7, a liver cell line (FIGS. 3C and D). The cells were grown at standard culture conditions (37° C., 5% CO₂) in DMEM medium supplemented with 10% FBS and antibiotics (Gibco BRL, Gaithersburg, Md.). 3×10⁴ cells per well in 96-well plates were transfected with 1 μg of the original expression vector (FIGS. 3A and C) or the modified vector (FIGS. 3B and D). Transfections were performed in serum-free medium using Lipofectamine 2000 (Gibco) for 5 h, followed by replacing the medium with serum-supplemented medium. After approximately 48 hours, the cells were visualized using a fluorescence microscope at the optimal excitation wavelength for GFP of 488 nm and an emission wavelength of 503 nm. Images were captured using a camera mounted on top of the microscope. The images were read by an algorithm that quantitated total fluorescence intensities in pixels. FIG. 3 shows the results of the quantitation. The modified vector, i.e. the vector with the hybrid 3′UTR, resulted in an approximately 5-fold increase in protein expression in the HeLa cell line and a more than 15-fold increase in protein expression in the Huh7 cell line (as evaluated by GFP expression).

Example 4

A PCR product generated from the modified vector using primers flanking the CMV promoter and the hybrid 3′UTR still leads to efficient transfection and optimum protein expression. HEK293 cells (3×10⁴ cells per well in 96-well plates) were transfected with 1 μg of the original expression vector (FIG. 4A) or with said PCR product generated from the modified vector (FIG. 4B). Transfections were performed in serum-free medium using Lipofectamine 2000 (Gibco) for 5 h, followed by replacing the medium with serum-supplemented medium. After approximately 72 hours, the cells were visualized using a fluorescence microscope at the optimal excitation wavelength for GFP of 488 nm and an emission wavelength of 503 nm. Images were captured using a camera mounted on top of the microscope. The images were read by an algorithm that quantitates total fluorescence intensities in pixels. FIG. 4 also shows the results of the quantitation, which indicate that GFP expression from the PCR product containing the hybrid 3′UTR is as effective as GFP expression from the original vector.

REFERENCES

-   Bakheet, T., M. Frevel, B. R. Williams, and K. S. Khabar. 2001. ARED: Human AU-rich element-containing mRNA database reveals unexpectedly diverse functional repertoire of encoded proteins. Nucleic Acids Res 29: 246-254.
-   Chen, C. Y. and A. B. Shyu. 1994. Selective degradation of early-response-gene mRNAs: functional analyses of sequence features of the AU-rich elements. Mol Cell Biol 14: 8471-8482.
-   Dalphin, M. E., P. A. Stockwell, W. P. Tate, and C. M. Brown. 1999. TransTerm, the translational signal database, extended to include full coding sequences and untranslated regions. Nucleic Acids Res 27: 293-294.
-   Eisenberg, E. and E. Y. Levanon. 2003. Human housekeeping genes are compact. Trends Genet 19(7): 362-365.
-   Foecking, M. K. and H. Hofstetter. 1986. Powerful and versatile enhancer-promoter unit for mammalian expression vectors. Gene 45: 101-105.
-   Goodwin, B. C. and F. M. Rottman. 1992. The 3′-flanking sequence of the bovine growth hormone gene contains novel elements required for efficient and accurate polyadenylation. J Biol Chem 267: 16330-16334.
-   Kobayashi, M., A. Tanaka, Y. Hayashi, and S. Shimamura. 1997. The CMV enhancer stimulates expression of foreign genes from the human EF-1 alpha promoter. Anal Biochem 247: 179-181.
-   Legendre, M. and D. Gautheret. 2003. Sequence determinants in human polyadenylation site selection. BMC Genomics 4: 7.
-   Shaw, G. and R. Kamen. 1986. A conserved AU sequence from the 3′ untranslated region of GM-CSF mRNA mediates selective mRNA degradation. Cell 46: 659-667.
-   Wigley, P. L., M. D. Sheets, D. A. Zarkower, M. E. Whitner, and M. Wickens. 1990. Polyadenylation of mRNA: minimal substrates and a requirement for the 2′ hydroxyl of the U in AAUAAA. Mol Cell Biol 10: 1705-1713.

1. A recombinant nucleic acid comprising a first and a second nucleic acid region, wherein (a) said first region is derived from a 3′ untranslated region (3′UTR) of a stable eukaryotic mRNA, (b) said first region does not contain a polyadenylation signal, (c) said second region is derived from the downstream end of a 3′ untranslated region of another or the same stable eukaryotic mRNA as in (a), and (d) said second region contains a polyadenylation signal.
2. The recombinant nucleic acid of claim 1, wherein said first region and said second region are in proximity to each other or are adjacent to each other or overlap with each other or one encompasses the other.
3. The recombinant nucleic acid of claim 2, wherein the 3′ end of the first region is adjacent to the 5′ end of the second region or the 3′ end of the second region is adjacent to the 5′ end of the first region or the first region is located within the second region, preferably upstream of the polyadenylation signal of the second region.
4. The recombinant nucleic acid of any of claims 1 to 3, wherein the first and/or second region is derived from the 3′UTR of housekeeping genes, such as β-actin, α- and β-globins, glyceraldehyde 3-phosphate dehydrogenase (GAPDH), growth hormone, ribosomal proteins or eukaryotic translation elongation factor 1 alpha 1 (EEF1A1).
5. The recombinant nucleic acid of any of claims 1 to 4, wherein said first region is the 3′UTR of eukaryotic translation elongation factor 1 alpha 1 (EEF1A1) or is a part thereof.
6. A recombinant nucleic acid comprising the following components in 5′ to 3′ direction: (a) a promoter sequence, (b) a protein coding region sequence, and (c) the hybrid 3′UTR according to any of claims 1 to 5, which is operably linked to the protein coding sequence of (b).
7. The recombinant nucleic acid of any of claims 1 to 6, wherein the nucleic acid is DNA, RNA, or PNA.
8. The nucleic acid according to claim 6 or 7, wherein the nucleic acid is a linear DNA molecule and is generated by PCR using at least two primers that specifically hybridize to regions near the 5′ end and specifically hybridize to regions near the 3′ end of the nucleic acid according to claim 6 or 7 which is comprised in an expression vector.
9. An expression vector comprising the nucleic acid according to any of claims 1 to 7.
10. The expression vector according to claim 9 further comprising a selection marker.
11. The expression vector according to claim 9 or 10, wherein the protein coding sequence is coding for a reporter protein or a therapeutic protein.
12. A method of producing a nucleic acid according to any of claims 1 to 8 in linear form, comprising the following steps of (a) providing an expression vector according to any of claims 9 to 11 comprising a nucleic acid according to claim 6 or 7, (b) providing at least two primers, wherein one primer specifically hybridizes to regions near the 5′ end of the nucleic acid according to claim 6 or 7, and wherein the other primer specifically hybridizes to regions near the 3′ end of the nucleic acid according to claim 6 or 7, (c) performing a PCR, and (d) obtaining the amplification product of step (c).
13. A host cell characterized in that it contains the expression vector of any of claims 9 to 11 or the recombinant nucleic acid of claim 8, and further characterized in that it transiently or stably expresses the protein encoded in the expression vector of any of claims 9 to 11 or the recombinant nucleic acid of claim 8.
14. The host cell according to claim 13 which is obtained by in vivo injection of the expression vector of any of claims 9 to 11 or the recombinant nucleic acid of claim 8 into a cell.
15. A method for obtaining a host cell according to claim 13, comprising the following steps of (a) providing an expression vector according to any of claims 9 to 11 or a nucleic acid according to claim 8, (b) in vivo injection of the expression vector according to any of claims 9 to 11 or the recombinant nucleic acid of claim 8 into cells, (c) obtaining the injected cells of step (b).
16. A method for expressing proteins, comprising providing and culturing of cells according to claim 13 or 14, under conditions allowing transient or stable expression of proteins, and obtaining said expressed proteins.